Summary

When calculating the sample size required for a desired power at a given type I error level, we
often assume that we know the exact time of every subject response that occurs during the study
period. In practice, however, it is very common to monitor subjects only periodically, so that we
know only whether a response occurred during an interval. This paper includes a quantitative
discussion of the effect of data grouping, or interval censoring, on the required sample size
when there are two treatment groups. Furthermore, with the goal of jointly optimizing the number
of subjects, the number of examinations per subject for test responses, and the total length of
the study period, this paper also provides a general guideline on how to choose these quantities
to minimize the total cost of a study for a desired power at a given $\alpha$-level. A specified
linear cost function that incorporates the costs of obtaining subjects, of periodic examinations
for test responses of subjects, and of the total length of the study period is assumed,
primarily for illustrative purposes.

Key  words:  Interval  censoring;  Linear  cost  function;  Maximum  likelihood 
estimator;  Sample  size  determination;  Hazard  rate. 


1.  Introduction

When calculating the sample size required to achieve a desired power at a fixed level of
significance, we usually assume that we know the exact time of all subject responses that occur
before the end of the study period (Gross and Clark, 1975; Narula and Li, 1975; Rasch, 1977;
Epstein and Sobel, 1953; George and Desu, 1974). In practice, however, it is very common to
monitor subjects only periodically, so that we know only whether a subject's response has
occurred during a given interval (Kulldorff, 1961; Cheng and Chen, 1988). We call data of this
type, in which the exact response times are unavailable, grouped data. For example, when we want
to study an asymptomatic chronic disease, continuous examination of all subjects in order to
pinpoint the exact response time will often be practically difficult.

With regard to the optimal design of reliability tests, numerous researchers have studied the
properties of maximum likelihood estimators (MLEs) for grouped data under different model
assumptions. Kulldorff (1961) systematically laid down the fundamental theory for grouped data
in exponential distributions. Cheng and Chen (1988) discussed the conditions for the existence
of the MLE and proposed an alternative estimator to the MLE under the Weibull model for grouped
data. Furthermore, for a fixed number of examinations per subject, Kulldorff (1961), Nelson
(1977), Wei and Bau (1987), and Wei and Shau (1987) discussed, for a variety of distributions,
the optimal interval length between consecutive examinations of the same subject in order to
minimize the asymptotic variance of the MLE. None of these papers, however, addressed optimal
sample size determination when the number of subjects, the number of examinations per subject,
and the length of the study period are permitted to vary simultaneously, with the goal of
minimizing the total cost incurred in conducting a study with a desired power $1 - \beta$ at the
$\alpha$-level of significance.

In this paper, we first present the required number of subjects based on grouped exponential
observations for a variety of parameter values in the situation where we have two treatment
groups. This provides a quantitative assessment of the effect of grouping on required sample
sizes, which are usually calculated under the assumption that the subject response times are
known exactly. To generalize previous discussions of optimality for grouped data (Kulldorff,
1961; Nelson, 1977; Wei and Bau, 1987; Wei and Shau, 1987), we consider a linear cost function
that incorporates the costs of obtaining subjects, of periodic examinations for test responses,
and the per-unit cost of maintaining the study over time. For a given power and test size, we
determine the required number of study subjects, the number of periodic examinations per
subject, and the length of the study period that minimize the total cost of a study.


2.  Theory 

Suppose that we have a completely randomized and balanced study designed to compare two
treatment groups, each with $n$ experimental subjects. Suppose also that the subject response
times in the standard treatment and the experimental treatment groups are exponentially
distributed with hazard rates $\lambda_1$ and $\lambda_2$, respectively. Because the standard
treatment is replaced only when $\lambda_1 > \lambda_2$, we focus our discussion on testing the
null hypothesis $H_0: \lambda_1 = \lambda_2$ versus the alternative hypothesis
$H_1: \lambda_1 > \lambda_2$. Furthermore, we assume that subject withdrawal may occur in both
treatment groups, with withdrawal rates $\gamma_1$ and $\gamma_2$, respectively. To determine the
response time for a given subject, we take $K$ examinations that are equally spaced over the
entire study period $T_0$. Let $\Delta$ denote this common length of time between any two
consecutive examinations; i.e., $\Delta = T_0/K$. Note that for a fixed study period $T_0$, if
$K$ were equal to $\infty$, $\Delta$ would equal 0. This corresponds to the situation in which we
have no interval censoring in our data. Note also that the loss in relative efficiency from
using equidistant examinations rather than the optimal grouping is generally small. This is
particularly true when the true values of $\lambda_i$, on which the optimal grouping strategy
depends, are unknown and must be guessed, and when the number of examinations $K$ is so large
that $\lambda_i T_0/K < 0.8$ (Kulldorff, 1961). Therefore, we focus the following discussion on
the equidistant case only.

Under the above assumptions, for a given subject $j$, where $j = 1, 2, \ldots, n$, in treatment
group $i$ ($i = 1, 2$), the probability that the response time $R_{ij}$ falls in the interval
$((k_{ij} - 1)\Delta, k_{ij}\Delta)$, where $k_{ij} = 1, 2, \ldots, K$, equals

$$P(R_{ij} \in ((k_{ij} - 1)\Delta, k_{ij}\Delta)) = \exp(-\lambda_i (k_{ij} - 1)\Delta)\,(1 - \exp(-\lambda_i \Delta)). \tag{1}$$

Similarly, for subject $j$, the probability that the withdrawal time $W_{ij}$ falls in the
interval $((k_{ij} - 1)\Delta, k_{ij}\Delta)$ equals

$$P(W_{ij} \in ((k_{ij} - 1)\Delta, k_{ij}\Delta)) = \exp(-\gamma_i (k_{ij} - 1)\Delta)\,(1 - \exp(-\gamma_i \Delta)). \tag{2}$$

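Therefore, the general likelihood $L_i$ for the $n$ subjects in group $i$ ($i = 1, 2$) can be
written as

$$
\begin{aligned}
L_i = \prod_{j=1}^{n} \;
 & \left[\exp(-\lambda_i (k_{ij} - 1)\Delta)\,(1 - \exp(-\lambda_i \Delta))\,\exp(-\gamma_i (k_{ij} - 1)\Delta)\right]^{\delta_{ij}} \\
 & \times \left[\exp(-\gamma_i (k_{ij} - 1)\Delta)\,(1 - \exp(-\gamma_i \Delta))\,\exp(-\lambda_i (k_{ij} - 1)\Delta)\right]^{\delta^*_{ij}} \\
 & \times \left[\exp(-(\lambda_i + \gamma_i) K \Delta)\right]^{1 - \delta_{ij} - \delta^*_{ij}},
\end{aligned}
\tag{3}
$$

where

$$
\delta_{ij} =
\begin{cases}
1, & \text{if subject } j \text{ in group } i \text{ responded before withdrawal and before the end of the study period } T_0, \\
0, & \text{otherwise,}
\end{cases}
$$

and

$$
\delta^*_{ij} =
\begin{cases}
1, & \text{if subject } j \text{ in group } i \text{ withdrew before response and before the end of the study period } T_0, \\
0, & \text{otherwise.}
\end{cases}
$$

On the basis of the above likelihood (3), we can obtain the MLE of $\lambda_i$ by solving the
following equation:

$$
\frac{\partial \log L_i}{\partial \lambda_i}
 = \sum_{j=1}^{n} \left\{ -[\delta_{ij} + \delta^*_{ij}]\,[k_{ij} - 1]\,\Delta
 + \delta_{ij}\,\frac{\Delta \exp(-\lambda_i \Delta)}{1 - \exp(-\lambda_i \Delta)}
 - K \Delta\,(1 - \delta_{ij} - \delta^*_{ij}) \right\} = 0.
\tag{4}
$$

This leads to the MLE of $\lambda_i$,

$$
\hat{\lambda}_i = \frac{1}{\Delta}
 \log\left( 1 + \frac{\sum_{j=1}^{n} \delta_{ij}}
 {\sum_{j=1}^{n} \left[ (\delta_{ij} + \delta^*_{ij})(k_{ij} - 1) + K (1 - \delta_{ij} - \delta^*_{ij}) \right]} \right).
\tag{5}
$$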
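As a minimal sketch of how the closed-form estimator (5) could be evaluated, the Python fragment
below computes $\hat{\lambda}_i$ for one treatment group and evaluates the score (4) at that
value. The function names, the `(status, k)` record layout, and the toy data are assumptions
made purely for illustration; they are not part of the original analysis.

```python
import math

def grouped_mle(records, K, delta):
    """Closed-form MLE (5) of the hazard rate for one treatment group.

    records: list of (status, k) pairs, one per subject (illustrative layout);
             status is "response", "withdrawal", or "censored", and k is the
             index (1..K) of the interval in which the event was recorded
             (ignored for subjects with neither event by T0).
    K:       number of equally spaced examinations.
    delta:   time between consecutive examinations, T0 / K.
    """
    responses = 0   # sum of delta_ij
    exposure = 0    # sum of (delta_ij + delta*_ij)(k_ij - 1) + K(1 - delta_ij - delta*_ij)
    for status, k in records:
        if status == "response":
            responses += 1
            exposure += k - 1
        elif status == "withdrawal":
            exposure += k - 1
        else:       # neither response nor withdrawal by T0
            exposure += K
    if responses == 0:
        return 0.0
    if exposure == 0:
        return math.inf  # every subject responded in the first interval
    return math.log(1.0 + responses / exposure) / delta

def score(lam, records, K, delta):
    """Left-hand side of the likelihood equation (4) at hazard rate lam."""
    ratio = delta * math.exp(-lam * delta) / (1.0 - math.exp(-lam * delta))
    total = 0.0
    for status, k in records:
        if status == "response":
            total += -(k - 1) * delta + ratio
        elif status == "withdrawal":
            total += -(k - 1) * delta
        else:
            total += -K * delta
    return total

# Hypothetical toy data: K = 3 examinations, 10 time units apart (T0 = 30).
toy = [("response", 1), ("response", 2), ("withdrawal", 2), ("censored", None)]
lam_hat = grouped_mle(toy, K=3, delta=10.0)
```

For these toy records the estimate is $\log(1 + 2/5)/10$, and `score(lam_hat, toy, 3, 10.0)`
vanishes up to rounding error, which is one way to check that (5) solves (4).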


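Furthermore, because

$$
-\frac{\partial^2 \log L_i}{\partial \lambda_i^2}
 = \sum_{j=1}^{n} \delta_{ij}\,\frac{\Delta^2 \exp(-\lambda_i \Delta)}{(1 - \exp(-\lambda_i \Delta))^2},
$$

and because

$$
E(\delta_{ij}) = \frac{\lambda_i}{\lambda_i + \gamma_i}\,\{1 - \exp[-(\lambda_i + \gamma_i) T_0]\},
$$

the asymptotic variance of $\log(\hat{\lambda}_i)$ is

$$
\mathrm{Var}(\log(\hat{\lambda}_i)) = \frac{V(\lambda_i, \gamma_i, T_0, K)}{n},
\tag{6}
$$

where

$$
V(\lambda_i, \gamma_i, T_0, K)
 = \left[ \frac{\lambda_i}{\lambda_i + \gamma_i}\,[1 - \exp(-(\lambda_i + \gamma_i) T_0)] \right]^{-1}
 \frac{(1 - \exp(-\lambda_i T_0/K))^2}{\exp(-\lambda_i T_0/K)\,(\lambda_i T_0/K)^2}.
$$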

Note that when $\gamma_i = 0$ (i.e., there is no subject withdrawal), the MLE $\hat{\lambda}_i$
and the asymptotic variance in formulae (5) and (6) reduce to those derived by Kulldorff (1961)
and Nelson (1977). Note also that for a fixed ratio $\lambda_i T_0$ between the study period and
the mean response time, variance formula (6) is an increasing function of the loss rate
$\gamma_i$ and a decreasing function of the number $K$ of examinations per subject. Furthermore,
if $K$ increased to $\infty$,

$$
\frac{(1 - \exp(-\lambda_i T_0/K))^2}{\exp(-\lambda_i T_0/K)\,(\lambda_i T_0/K)^2}
$$

would decrease to 1. Therefore, variance formula (6) reduces to the variance obtained when we
know the exact response times that occur before $T_0$.

Consider testing the null hypothesis $H_0: \lambda_1 = \lambda_2$ versus the alternative
hypothesis $H_1: \lambda_1/\lambda_2 = R > 1$ with a desired power $1 - \beta$ at the
$\alpha$-level. In developing sample size formulae, as is frequently done elsewhere (Taulbee and
Symons, 1983; Gross, Hung, Cantor, and Clark, 1987), we rely on the asymptotic normality of the
MLE. On the basis of formula (6), for a fixed study period $T_0$ and a given number $K$ of
examinations per subject, the required number of subjects, $n$, is then given by the smallest
integer larger than

$$
\frac{\left[ Z_\alpha \sqrt{V(\bar{\lambda}, \gamma_1, T_0, K) + V(\bar{\lambda}, \gamma_2, T_0, K)}
 + Z_\beta \sqrt{V(\lambda_1, \gamma_1, T_0, K) + V(\lambda_1/R, \gamma_2, T_0, K)} \right]^2}
 {(\log(R))^2},
\tag{7}
$$

where $\bar{\lambda} = \lambda_1 (1 + 1/R)/2$, and where $Z_\alpha$ and $Z_\beta$ are the upper
$100\alpha$th and $100\beta$th percentiles of a standard normal distribution, respectively.
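The sample size rule (7) can be sketched numerically as follows. This is a minimal illustration
rather than the original computation: the function names `variance_factor` and `required_n` are
assumptions, and the normal percentiles are taken from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def variance_factor(lam, gamma, T0, K):
    """V(lambda, gamma, T0, K) as defined below formula (6)."""
    total_rate = lam + gamma
    # Probability that a response is observed before withdrawal and before T0.
    p_response = (lam / total_rate) * (1.0 - math.exp(-total_rate * T0))
    x = lam * T0 / K          # hazard rate times interval length; 0.0 if K is infinite
    if x == 0.0:
        grouping = 1.0        # limiting value for exactly observed response times
    else:
        grouping = (1.0 - math.exp(-x)) ** 2 / (math.exp(-x) * x ** 2)
    return grouping / p_response

def required_n(lam1, R, T0, K, gamma1=0.0, gamma2=0.0, alpha=0.05, power=0.90):
    """Subjects per group: smallest integer larger than the bound in (7)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)   # upper 100*alpha-th percentile
    z_beta = NormalDist().inv_cdf(power)          # upper 100*beta-th percentile
    lam_bar = lam1 * (1.0 + 1.0 / R) / 2.0
    sd_null = math.sqrt(variance_factor(lam_bar, gamma1, T0, K)
                        + variance_factor(lam_bar, gamma2, T0, K))
    sd_alt = math.sqrt(variance_factor(lam1, gamma1, T0, K)
                       + variance_factor(lam1 / R, gamma2, T0, K))
    bound = (z_alpha * sd_null + z_beta * sd_alt) ** 2 / math.log(R) ** 2
    return math.floor(bound) + 1
```

With no withdrawal, $\lambda_1 = 0.015$, $R = 2$, and $T_0 = 50$, this sketch returns 89 for
$K = 1$ and 87 for $K = \infty$ (passed as `math.inf`).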

Let $C_1$ and $C_2$ denote the costs per subject and per examination, respectively. Furthermore,
let $C_3$ denote the cost per unit time during the study period $T_0$. The total cost of the
study under consideration here is then given by $2 C_1 n + 2 C_2 n K + C_3 T_0$. To detect the
treatment effect $R = \lambda_1/\lambda_2 > 1$ at the $\alpha$-level with a desired power
$1 - \beta$, we want to find $n$, $K$, and $T_0$ that minimize the above linear cost function
subject to the following constraints: (i) $n$ and $K$ are integers, and (ii)

$$
\frac{Z_\alpha \sqrt{V(\bar{\lambda}, \gamma_1, T_0, K) + V(\bar{\lambda}, \gamma_2, T_0, K)} - \log(R)\,\sqrt{n}}
 {\sqrt{V(\lambda_1, \gamma_1, T_0, K) + V(\lambda_1/R, \gamma_2, T_0, K)}} = -Z_\beta.
\tag{8}
$$

We apply the IMSL subroutine DNCONF to obtain these optimal solutions, using the following
sequential procedure. We first solve the optimization problem of minimizing the above linear
cost function subject to constraint (8), but without imposing the integer constraints. Then the
value of $K$ is fixed, in turn, at the greatest integer smaller than its unconstrained value and
at the smallest integer greater than its unconstrained value. The subsequent two-dimensional
problem (involving $n$ and $T_0$) is solved, again without constraining $n$ to be
integer-valued. The value of $n$ is then fixed at the smallest integer larger than its
unconstrained value, and the value of $T_0$ is found by solving the subsequent one-dimensional
problem. This procedure is repeated with the roles of $K$ and $n$ reversed. Of the resulting
four sets of values for $(K, n, T_0)$, the set with the smallest cost is selected as our final
optimal solution.

3.  Results

To study the possible loss of efficiency resulting from grouping, we summarize the required
sample sizes for a power of 0.90 at the 0.05 level, calculated using sample size formula (7).
Tables 1 and 2 present the results for the hazard rate $\lambda_1$ in the standard treatment
group ranging from 0.015 to 0.030, the fixed study period $T_0$ ranging from 50 to 400, the
treatment effect $R$ ranging from 2 to 4, the number of observations $K$ per subject ranging
from 1 to infinity, and the common loss rate $\gamma$ equal to 0.00 and 0.01. For example, when
the loss rate $\gamma = 0.00$, the hazard rate $\lambda_1 = 0.015$, the study period
$T_0 = 50$, and $R$ ranges from 2 to 4 (Table 1), taking one or two observations per subject is
almost as efficient as taking infinitely many observations per subject (i.e., knowing the exact
time of test response), as evidenced by the fact that the required sample size is virtually
constant over all values of $K$. The same holds for this configuration with the loss rate equal
to 0.01 (Table 2). By contrast, when the hazard rate is $\lambda_1 = 0.030$ and the study period
increases to 400, the loss of efficiency due to grouping is substantial and can remain as large
as 20 to 30% (Tables 1 and 2), even when taking as many as 5 examinations per subject.


Table 1

Required sample sizes for a power of 0.90 at the 0.05 level (one-sided test) based on grouped
exponential observations, with the hazard rate in the standard treatment group, λ₁, ranging
from 0.015 to 0.030, the fixed study period, T₀, ranging from 50 to 400, the treatment effect
R (= λ₁/λ₂) ranging from 2 to 4, and the number of examinations per subject, K, ranging from
1 to ∞, at the loss rate, γ, equal to 0.00 per unit of time.

   λ₁     T₀    R    K = 1      2      3      4      5     10      ∞

 0.015    50    2       89     87     87     87     87     87     87
                3       41     41     41     41     41     41     41
                4       29     29     29     29     29     29     29
         100    2       61     56     56     55     55     55     55
                3       27     25     25     25     25     25     25
                4       19     18     18     17     17     17     17
         200    2       63     46     43     42     42     41     41
                3       25     19     19     18     18     18     18
                4       16     13     13     12     12     12     12
         400    2      190     57     45     41     39     37     37
                3       65     22     18     17     16     16     15
                4       39     14     12     11     11     10     10

 0.025    50    2       66     62     62     61     61     61     61
                3       30     28     28     28     28     28     28
                4       21     20     20     20     20     20     20
         100    2       58     47     45     44     44     44     44
                3       24     20     20     19     19     19     19
                4       16     14     13     13     13     13     13
         200    2      119     50     43     40     39     38     37
                3       43     20     18     17     16     16     16
                4       26     13     11     11     11     11     10
         400    2     2194    116     62     49     44     38     36
                3      693     41     23     19     17     15     15
                4      396     25     14     12     11     10     10

 0.030    50    2       61     56     56     55     55     55     55
                3       27     25     25     25     25     25     25
                4       19     18     18     17     17     17     17
         100    2       63     46     43     42     42     41     41
                3       25     19     19     18     18     18     18
                4       16     13     13     12     12     12     12
         200    2      190     57     45     41     39     37     37
                3       65     22     18     17     16     16     16
                4       39     14     12     11     11     10     10
         400    2     9227    188     77     56     48     39     36
                3     2888     64     28     21     19     16     15
                4     1651     38     17     13     12     10     10
Table 2

Required sample sizes for a power of 0.90 at the 0.05 level (one-sided test) based on grouped
exponential observations, with the hazard rate, $\lambda_1$, in the standard treatment group
ranging from 0.015 to 0.030, the fixed study period, $T_0$, ranging from 50 to 400, the
treatment effect $R$ ($= \lambda_1/\lambda_2$) ranging from 2 to 4, and the number of
examinations per subject, $K$, ranging from 1 to $\infty$, at the loss rate, $\gamma$, equal to
0.01 per unit of time.

Table 3

Optimal length of the study period, $T_0$, number of subjects, $n$, and number of examinations
per subject, $K$, for a power of 0.90 at the 0.05 level (one-sided test) based on grouped
exponential observations, with the hazard rate, $\lambda_1$, in the standard treatment group
ranging from 0.015 to 0.030 and the treatment effect $R$ ranging from 2 to 4, at the loss rate,
$\gamma$, equal to 0.00.

Table 4

Optimal length of the study period, $T_0$, number of subjects, $n$, and number of examinations
per subject, $K$, for a power of 0.90 at the 0.05 level (one-sided test) based on grouped
exponential observations, with the hazard rate, $\lambda_1$, in the standard treatment group
ranging from 0.015 to 0.030 and the treatment effect $R$ ranging from 2 to 4, at the mean loss
rate, $\gamma$, equal to 0.01 per unit of time.

We next address the problem of minimizing total study cost subject to integer constraints on
$n$ and $K$ and the inference constraint (8) for a power of 0.90 at the 0.05 level of
significance. Tables 3 and 4 summarize the results by giving the study period, $T_0$, the
required number of subjects, $n$, and the number of examinations per subject, $K$, for the
subject/examination unit cost ratio, $C_1/C_2$, ranging from 1 to 20, the time/examination unit
cost ratio, $C_3/C_2$, ranging from 0 to 10, the basic hazard rate, $\lambda_1$, ranging from
0.015 to 0.030, the treatment effect, $R$, ranging from 2 to 4, and the loss rate, $\gamma$,
equal to 0.00 and 0.01. For example, when $C_1/C_2 = 10$, $\lambda_1 = 0.030$, and $R = 3$, the
optimal study period $T_0$, the required number of subjects, $n$, and the number of examinations
$K$ per subject are 80.9, 20, and 2 (Table 3). In the same situation but with the loss rate
$\gamma = 0.01$ (Table 4), the optimal solutions for $T_0$, $n$, and $K$ become 63.8, 28, and 2,
respectively.
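The cost minimization can be sketched in a few lines of Python. The fragment below does not
reproduce the IMSL DNCONF sequential procedure of Section 2; it swaps in a plain enumeration over
integer $n$ and $K$, combined with a grid scan and bisection for the smallest $T_0$ that satisfies
constraint (8). All function names, the search limits (`T_max`, `step`, `K_max`, `n_max`), and the
cost values in the usage note are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def variance_factor(lam, gamma, T0, K):
    """V(lambda, gamma, T0, K) as defined below formula (6)."""
    total_rate = lam + gamma
    p_response = (lam / total_rate) * (1.0 - math.exp(-total_rate * T0))
    x = lam * T0 / K
    grouping = 1.0 if x == 0.0 else (1.0 - math.exp(-x)) ** 2 / (math.exp(-x) * x ** 2)
    return grouping / p_response

def n_bound(lam1, R, T0, K, gamma, alpha=0.05, power=0.90):
    """Real-valued n meeting constraint (8) with equality (common loss rate gamma)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    lam_bar = lam1 * (1.0 + 1.0 / R) / 2.0
    sd_null = math.sqrt(2.0 * variance_factor(lam_bar, gamma, T0, K))
    sd_alt = math.sqrt(variance_factor(lam1, gamma, T0, K)
                       + variance_factor(lam1 / R, gamma, T0, K))
    return (z_alpha * sd_null + z_beta * sd_alt) ** 2 / math.log(R) ** 2

def smallest_T0(n, K, lam1, R, gamma, T_max=600.0, step=2.0):
    """Smallest study period at which n subjects per group suffice, or None."""
    lo, T = 1e-9, step
    while T <= T_max:
        if n_bound(lam1, R, T, K, gamma) <= n:
            hi = T                       # first feasible grid point; refine by bisection
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                if n_bound(lam1, R, mid, K, gamma) <= n:
                    hi = mid
                else:
                    lo = mid
            return hi
        lo = T
        T += step
    return None

def cheapest_design(lam1, R, gamma, C1, C2, C3, K_max=6, n_max=100):
    """Enumerate integer (K, n); return (cost, K, n, T0) with the smallest linear cost."""
    best = None
    for K in range(1, K_max + 1):
        for n in range(2, n_max + 1):
            T0 = smallest_T0(n, K, lam1, R, gamma)
            if T0 is None:
                continue                 # n subjects are never enough for this K
            cost = 2 * C1 * n + 2 * C2 * n * K + C3 * T0
            if best is None or cost < best[0]:
                best = (cost, K, n, T0)
    return best
```

For instance, `smallest_T0(20, 2, 0.030, 3.0, 0.0)` returns a study period of approximately 80.9,
in line with the Table 3 example quoted above, and a call such as
`cheapest_design(0.030, 3.0, 0.0, C1=10.0, C2=1.0, C3=1.0)` (hypothetical unit costs) searches
over all designs within the stated limits. Because the cost is nondecreasing in $T_0$ for fixed
$n$ and $K$, only the smallest feasible study period needs to be considered for each pair.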


4.  Discussion 


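If we know the exact times of the test responses (or, equivalently, the number of examinations,
$K$, per subject is infinite), then since

$$
\mathrm{Var}(\log(\hat{\lambda}_i))
 = \frac{1}{n} \left[ \frac{\lambda_i}{\lambda_i + \gamma_i}\,[1 - \exp(-(\lambda_i + \gamma_i) T_0)] \right]^{-1}
$$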


35(1993)6 




is a decreasing function of the study period $T_0$, the required sample size always decreases as
$T_0$ increases (Tables 1 and 2). This is not true, however, for grouped data. In fact, for a
fixed number of observations $K$ per subject, increasing the length of the study period $T_0$
alone can instead lead to a larger required number of subjects (Tables 1 and 2). This increase
can be substantial, especially when the ratio of the length of the study period to the mean
response time, $\lambda_i T_0$, is large and $K$ is small. Therefore, when we have grouped data
and the underlying test responses have a short mean lifetime, we should consider increasing the
number of observations, $K$, per subject before increasing the study period to improve the
precision of the MLE. On the other hand, if $\lambda_i T_0$ or $T_0$ is small (Tables 1 and 2),
the required number of subjects for $K = 2$ is essentially equal to that for $K = \infty$.
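The mechanism behind this non-monotone behaviour can be seen directly in the variance factor of
formula (6). As an illustrative sketch (the function name and the grid of study periods are
assumptions), the fragment below scans $T_0$ for a single examination per subject and no
withdrawal. In that case $V$ reduces to $(e^{x} - 1)/x^2$ with $x = \lambda_i T_0$, which is
smallest where $x = 2(1 - e^{-x})$, i.e., near $x \approx 1.59$, and grows again for longer
study periods.

```python
import math

def variance_factor(lam, gamma, T0, K):
    """V(lambda, gamma, T0, K) as defined below formula (6)."""
    total_rate = lam + gamma
    p_response = (lam / total_rate) * (1.0 - math.exp(-total_rate * T0))
    x = lam * T0 / K
    grouping = 1.0 if x == 0.0 else (1.0 - math.exp(-x)) ** 2 / (math.exp(-x) * x ** 2)
    return grouping / p_response

lam = 0.015                      # hazard rate used in the first block of Table 1
periods = range(1, 401)          # candidate study periods T0 = 1, 2, ..., 400

# One examination per subject: the variance factor has an interior minimum.
best_T0_single = min(periods, key=lambda T0: variance_factor(lam, 0.0, T0, 1))

# Exactly observed response times (K infinite): the factor decreases throughout.
exact = [variance_factor(lam, 0.0, T0, math.inf) for T0 in periods]
```

Here `lam * best_T0_single` lands close to 1.59, whereas the values in `exact` decrease over the
whole range of study periods.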

From both Tables 3 and 4, the optimal number of examinations, $K$, increases with the relative
cost $C_1/C_2$. This is consistent with intuition: if obtaining a subject is much more costly
than obtaining an examination, it is wise to increase the number of examinations per subject in
order to reduce the number of subjects required to reach a desired power. The optimal number of
examinations, $K$, however, decreases as $C_3/C_2$ increases. This can be interpreted as
follows: when the cost per unit of study time is relatively high, it may be more economical to
increase the number of subjects and reduce the study period $T_0$. In this situation, in which
$T_0$ is small, taking one or two examinations is, as noted before, as efficient as taking
infinitely many ($K = \infty$) examinations. In other words, increasing the number of
examinations per subject, $K$, does not significantly reduce the asymptotic variance. Note also
that even when the relative cost $C_1/C_2$ is as large as 20, the optimal number of examinations
per subject, $K$, is not greater than 4 (Tables 3 and 4). This suggests that, in practice, we
may seldom need to take more than 4 observations per subject.
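A short expansion, added here as an illustration rather than taken from the original derivation,
indicates why a few examinations suffice when $\lambda_i T_0$ is small. Writing
$x = \lambda_i T_0/K$, the grouping factor in formula (6) can be rewritten and expanded as

```latex
\frac{(1 - e^{-x})^2}{e^{-x} x^2}
  = \left( \frac{e^{x/2} - e^{-x/2}}{x} \right)^2
  = \left( \frac{\sinh(x/2)}{x/2} \right)^2
  = 1 + \frac{x^2}{12} + O(x^4),
  \qquad x = \frac{\lambda_i T_0}{K}.
```

The inflation of the asymptotic variance due to grouping is therefore of second order in $x$;
for $x = 0.5$, for example, it is only about 2%.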

When the loss rate $\gamma$ is greater than 0, there is a nonzero probability that a given
subject will drop out before the end of the study, and this probability increases with the
length of the study period. Therefore, comparing the results in Table 3 ($\gamma = 0$) with
those in Table 4 ($\gamma = 0.01$), we see that it is preferable to increase the required number
of subjects rather than the length of the study period in order to compensate for the
information lost through drop-outs. In fact, the optimal study periods, $T_0$, in Table 4 are
generally shorter than the corresponding ones in Table 3, and so are the optimal numbers of
examinations per subject.

In summary, we have derived a general sample size formula for the required number of subjects
for grouped exponential data. We have quantitatively studied the effect of grouping on the
required number of subjects in a variety of situations. We have also included a discussion of
the optimal sample allocation and the optimal length of the study period under a linear cost
function.